
feat: dashboard improvements — admin infra, table reactivity, page separation #393

Merged
lalalune merged 30 commits into dev from sol/dashboard-improvements-20260316
Mar 16, 2026

Conversation

@0xSolace
Collaborator

Summary

Dashboard improvements covering admin infrastructure, table reactivity fixes, page separation, and agent detail redesign.

Changes

🆕 Admin Infrastructure Dashboard (+1004 lines)

  • New admin API endpoint (/api/v1/admin/infrastructure) for SSH node inspection
  • Infrastructure service with container health classification (healthy/failed/missing/warming/stale)
  • Unit tests — 5 tests, all passing

🐛 Table Reactivity Fix (+470 / -239)

  • Major rewrite of milady-sandboxes-table: local state management + optimistic updates
  • Added onDataRefresh callback to useSandboxListPoll for live table updates
  • Create dialog onCreated callback triggers table refresh
  • Eliminates the stale-data-after-action bug (no more manual page reload needed)

✨ Page Separation (+48 / -63)

  • Containers page: now shows only custom Docker containers
  • Milady page: dedicated managed agents view
  • New milady-page-wrapper component for consistent header styling

✨ Agent Detail Page (+218 / -366)

  • Redesigned layout and UX
  • Cleaner information hierarchy, net reduction in code

📝 Documentation

  • Added MILADY_CLOUD_ROADMAP.md — living roadmap for infrastructure, provisioning, dashboard, and platform goals

Testing

  • bunx @biomejs/biome check — ✅ all 10 files clean
  • bun test packages/tests/unit/admin-infrastructure.test.ts — ✅ 5/5 passing

Commits (5 logical commits)

  1. feat: admin infrastructure dashboard API + service
  2. fix: table reactivity — optimistic updates, client-side data refresh
  3. feat: separate milady instances and containers pages
  4. feat: agent detail page improvements
  5. docs: add milady cloud roadmap

- New useSandboxStatusPoll hook: polls individual agent status every 5s
  during pending/provisioning, auto-stops on terminal states
- New useSandboxListPoll hook: polls agent list every 10s while any
  sandbox is active, fires toast on running transition
- Create dialog stays open during provisioning with 4-step progress
  stepper (created → database → container → running), elapsed timer,
  retry button on error, and Open Web UI button on completion
- Status cell in table now shows pulse animation for active states,
  scale-in animation for running transitions, shake for errors
- Added shake and scaleIn keyframe animations to globals.css
- New admin infrastructure API endpoint (app/api/v1/admin/infrastructure/route.ts)
- Admin infrastructure service with SSH node inspection and container health classification
- Unit tests for container health classification (5 tests, all passing)
- Classifies containers as healthy, failed, missing, warming, or stale based on
  runtime state, control plane records, and heartbeat freshness
- Major rewrite of milady-sandboxes-table: local state management, optimistic
  updates, no more stale data after start/stop/destroy actions
- Added onDataRefresh callback to useSandboxListPoll hook for live table updates
- Create dialog improvements with onCreated callback to trigger table refresh
- Eliminates the stale-data-after-action problem that required manual page reload
- Containers page now shows only custom Docker containers
- Milady page is now a dedicated managed agents view
- New milady-page-wrapper component for consistent page header styling
- Clear separation of concerns between managed agents and raw containers
- Redesigned agent detail page layout and UX
- Cleaner information hierarchy and visual presentation
@vercel

vercel bot commented Mar 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| eliza-cloud-v2 | Ready | Preview, Comment | Mar 16, 2026 10:20pm |

@coderabbitai

coderabbitai bot commented Mar 16, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 21fd0eb6-320a-425e-85a9-df71b85ef95c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@claude

claude bot commented Mar 16, 2026

Code Review

Overall this is a solid, well-structured PR. The optimistic update pattern, page separation, and agent detail redesign are all improvements. Here are the issues worth addressing before merge.


🔴 High — Auth errors return 500 instead of 401/403

app/api/v1/admin/infrastructure/route.ts

requireAdmin throws AuthenticationError / ForbiddenError for unauthenticated/non-admin requests, but the call is outside the try/catch. Those exceptions fall into the outer catch and return HTTP 500 rather than 401/403.

export async function GET(request: NextRequest) {
  // requireAdmin can throw — not wrapped in try/catch
  const { role } = await requireAdmin(request);
  if (role !== "super_admin") { ... 403 ... }

  try {
    // only THIS block is protected
    const snapshot = await getAdminInfrastructureSnapshot();
    ...
  } catch (error) { ... 500 ... }
}

Fix: wrap requireAdmin in its own try/catch (matching the pattern in other admin routes) or move it inside the main try block with error discrimination.
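A minimal sketch of the discrimination half of that fix; the error classes here are local stand-ins for the repo's real AuthenticationError / ForbiddenError:

```typescript
// Hypothetical stand-ins for the repo's auth error classes; only the
// instanceof discrimination pattern is the point of this sketch.
class AuthenticationError extends Error {}
class ForbiddenError extends Error {}

// Map an error thrown by requireAdmin to the HTTP status the route should
// return, instead of letting it fall through to a blanket 500.
function statusForAuthError(error: unknown): number {
  if (error instanceof AuthenticationError) return 401;
  if (error instanceof ForbiddenError) return 403;
  return 500; // genuinely unexpected failure
}
```

In the route, `await requireAdmin(request)` would then sit in its own try/catch whose catch returns `statusForAuthError(error)` as the response status.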


🔴 High — Unbounded parallel SSH connections; function timeout risk

packages/lib/services/admin-infrastructure.ts

Promise.all(nodes.map(inspectNodeRuntime)) fires one SSH session per node simultaneously with no concurrency cap. Each session can run up to 5 parallel commands with SSH_COMMAND_TIMEOUT_MS = 15_000. On any meaningful cluster size this risks hitting Vercel's 30-second function timeout and saturating the connection pool.

Recommend adding a concurrency limiter (e.g. p-limit) capping concurrent SSH sessions to 3–5, plus a short-lived cache (even 30-second TTL) to prevent parallel identical fetches from accidental double-loads or future UI polling.
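A dependency-free sketch of that cap, assuming result order must match node order; the p-limit package provides the same behavior as a maintained library:

```typescript
// Run `fn` over `items` with at most `limit` invocations in flight.
// Illustrative helper, not existing repo code.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker claims the next unclaimed index (synchronously, so no race
  // in single-threaded JS) until the list is drained.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```

The snapshot call would become `await mapWithConcurrency(nodes, 4, inspectNodeRuntime)` instead of the uncapped `Promise.all(nodes.map(...))`.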


🟠 Medium — buildResourceAlert — warning and critical produce identical strings

packages/lib/services/admin-infrastructure.ts

function buildResourceAlert(label: string, percent: number | null): string | null {
  if (percent === null) return null;
  if (percent >= NODE_RESOURCE_CRITICAL_PCT) return `${label} ${percent}% used`;
  if (percent >= NODE_RESOURCE_WARNING_PCT) return `${label} ${percent}% used`;  // identical!
  return null;
}

Both branches produce the same string, so callers can't distinguish warning from critical. The incident-generation logic only fires on CRITICAL_PCT, meaning warning-level resource pressure never produces an incident — silently dropped. Either embed the severity in the returned string or return a structured value.
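A sketch of the structured-return option; the threshold constants here are illustrative stand-ins for NODE_RESOURCE_CRITICAL_PCT / NODE_RESOURCE_WARNING_PCT:

```typescript
// Return severity alongside the message so callers (and the
// incident-generation logic) can distinguish warning from critical.
type ResourceAlert = {
  label: string;
  percent: number;
  severity: "warning" | "critical";
};

const CRITICAL_PCT = 90; // illustrative value
const WARNING_PCT = 75; // illustrative value

function buildResourceAlert(label: string, percent: number | null): ResourceAlert | null {
  if (percent === null) return null;
  if (percent >= CRITICAL_PCT) return { label, percent, severity: "critical" };
  if (percent >= WARNING_PCT) return { label, percent, severity: "warning" };
  return null;
}
```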


🟠 Medium — parseDockerHealth has fragile substring ordering

packages/lib/services/admin-infrastructure.ts

if (normalized.includes("unhealthy")) return "unhealthy";
if (normalized.includes("healthy"))   return "healthy";   // "unhealthy" contains "healthy"

This happens to work because "unhealthy" is checked first, but it's a foot-gun. Also, "health: starting" is a redundant check since "starting" later in the chain covers it. Prefer === "healthy" / === "unhealthy" after stripping the "health: " prefix, or use a startsWith check.
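A sketch of the exact-match approach, assuming the input is the health segment itself (e.g. "healthy", "unhealthy", "health: starting"); the real input shape in the service isn't shown here:

```typescript
// Strip the "health: " prefix, then compare exactly instead of via
// substring checks ("unhealthy" contains "healthy", so includes() is a
// foot-gun that only works because of check ordering).
type DockerHealth = "healthy" | "unhealthy" | "starting" | "none";

function parseDockerHealth(raw: string): DockerHealth {
  const normalized = raw.trim().toLowerCase().replace(/^health:\s*/, "");
  if (normalized === "healthy") return "healthy";
  if (normalized === "unhealthy") return "unhealthy";
  if (normalized === "starting") return "starting";
  return "none";
}
```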


🟠 Medium — useSandboxListPoll skips the initial poll

packages/lib/hooks/use-sandbox-status-poll.ts

useSandboxStatusPoll calls void poll() before setting the interval, but useSandboxListPoll only sets the interval — the first fetch fires 10 seconds after mount. Components using this hook show stale data for the first 10 s.


🟠 Medium — mergeApiData silently drops optimistically-added agents

packages/ui/src/components/containers/milady-sandboxes-table.tsx

mergeApiData returns apiAgents.map(...) — exactly the agents the API returned. If the API hasn't yet reflected a newly created agent (created optimistically), that local entry is silently dropped on the next poll. The optimistic delete is correctly reverted on failure; the optimistic add has no equivalent protection.


🟡 Low — Three redundant data fetches on provisioning completion

packages/ui/src/components/containers/create-milady-sandbox-dialog.tsx

On provisioning completion the sequence fires: (1) router.refresh() from the useEffect watching status === "running", (2) router.refresh() from handleClose, (3) the onCreated → refreshData callback in the parent. That's three overlapping full refreshes. router.refresh() should only be called once, at close.


🟡 Low — sshUser / sshPort unnecessarily exposed in API response

The AdminInfrastructureNode response includes sshUser and sshPort. This endpoint is super-admin only, but infrastructure metadata in a JSON API response can end up in logs, caches, or forwarded requests. Worth stripping or gating behind an explicit ?debug=true query param.


🟡 Low — "unknown" live health variant declared but never emitted

packages/lib/services/admin-infrastructure.ts

ContainerLiveHealthStatus includes "unknown" but classifyContainerHealth never returns it (the final fallthrough always returns "healthy"). Either remove the variant or add a true fallback.


🧪 Test coverage gaps

packages/tests/unit/admin-infrastructure.test.ts

The 5 tests cover the happy path. Missing branches:

  • dbStatus === "stopped" with and without a runtime present
  • runtime.state === "dead" / "exited" / "restarting" / "created"
  • runtime.health === "starting"
  • Heartbeat exactly at the HEARTBEAT_WARNING_MINUTES (5 min) boundary
  • dbStatus === "error" and "disconnected" with a running runtime

No tests at all for the parsing helpers (parseDockerHealth, parseMemoryPercent, parsePercent, parseRuntimeContainers) or the incident/summary aggregation in getAdminInfrastructureSnapshot. The |-delimiter parsing in parseRuntimeContainers in particular is easy to break silently.
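As a sketch of what those parser tests could pin down, using a hypothetical stand-in since parseRuntimeContainers itself isn't shown here:

```typescript
// Hypothetical stand-in for a "name|state|status" line parser, to show the
// edge cases worth testing: empty output, blank lines, malformed lines.
// Extra fields beyond the first three are ignored in this sketch.
type RuntimeContainer = { name: string; state: string; status: string };

function parseRuntimeLines(output: string): RuntimeContainer[] {
  return output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .flatMap((line) => {
      const parts = line.split("|");
      if (parts.length < 3) return []; // drop malformed lines instead of guessing
      const [name, state, status] = parts;
      return [{ name, state, status }];
    });
}
```

Tests over exactly these branches (empty string, blank-line-only output, a line with too few fields) are the ones that keep the |-delimiter format from breaking silently.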


✅ What's good

  • Auth double-check (session + super_admin role) in the API route is correct.
  • SSH commands use hard-coded strings — no injection risk from dynamic input.
  • Optimistic delete with revert-on-failure is implemented correctly.
  • Agent type annotation Awaited<ReturnType<typeof miladySandboxService.getAgent>> is the right pattern.
  • force-dynamic on the admin route is appropriate.
  • Biome and unit tests passing is a good baseline.

🤖 Generated with Claude Code

@claude

claude bot commented Mar 16, 2026

PR Review — #393: Dashboard improvements

Overall this is a well-structured PR with good commit separation and test coverage for the happy path. Below are issues found across security, correctness, performance, and code quality.


🔴 HIGH — Auth error outside try/catch

app/api/v1/admin/infrastructure/route.ts L15

const { role } = await requireAdmin(request);  // outside try/catch
if (role !== "super_admin") { ... }

try {
  const snapshot = await getAdminInfrastructureSnapshot();
  ...
} catch (error) { ... }

If requireAdmin throws (e.g. DB error resolving the session), the exception escapes the try/catch entirely. Next.js will return a 500 with a potentially leaky stack trace. The requireAdmin call should be inside the try/catch, or at minimum follow the same guard pattern used in other admin routes.


🔴 HIGH — No overall timeout on infrastructure snapshot → Vercel 504

packages/lib/services/admin-infrastructure.ts — getAdminInfrastructureSnapshot()

SSH_CONNECT_TIMEOUT_MS (10s) plus 5 commands at SSH_COMMAND_TIMEOUT_MS (5 × 15s = 75s) gives an 85s worst case per hanging node. Promise.all across N nodes caps the wall time at the slowest node (85s), but Vercel's 10-second default serverless timeout will kill the request first, returning a 504 to the client with no graceful error. Wrap inspectNodeRuntime in a Promise.race against a deadline, or add an AbortController-based overall cap to getAdminInfrastructureSnapshot().
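A minimal sketch of the Promise.race deadline (the helper name is illustrative). Note this only bounds the caller's wait; the underlying SSH work keeps running unless an AbortController is also threaded through:

```typescript
// Reject if `promise` hasn't settled within `ms`; otherwise pass the
// settlement through and clear the timer.
function withDeadline<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}
```

Per node this becomes something like `withDeadline(inspectNodeRuntime(node), 25_000, node.hostname)`, so one hung node degrades to an error entry in the snapshot instead of a 504 for the whole request.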


🟡 MEDIUM — SSH coordinates in API response

packages/lib/services/admin-infrastructure.ts — AdminInfrastructureNode

The full node shape including sshPort, sshUser, and hostname is returned as-is to the client. This is a super-admin-only endpoint so the risk is mitigated, but if this role check is ever weakened these fields directly expose SSH access coordinates for all Docker nodes. Consider omitting them from the response shape or documenting the explicit decision to include them.


🟡 MEDIUM — useSandboxListPoll missing initial poll

packages/lib/hooks/use-sandbox-status-poll.ts

useSandboxStatusPoll fires void poll() immediately before setInterval. useSandboxListPoll does not — it goes straight to setInterval, so on first mount the table won't receive fresh data for up to 10 seconds. The asymmetry with the single-agent variant looks unintentional.


🟡 MEDIUM — mergeApiData drops agents absent from API response

packages/ui/src/components/containers/milady-sandboxes-table.tsx

return apiAgents.map((agent) => { ... });

The merge maps over apiAgents, discarding any localSandboxes entries not present in the response. If the API ever returns a partial list (pagination added later, race where a just-created sandbox isn't visible yet), table rows will silently disappear. The merge should be a union: update matched entries, preserve unmatched localSandboxes entries.
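A sketch of the union merge, with a pared-down row type standing in for the real agent shape:

```typescript
// Only `id` and `status` matter for this sketch.
type Sandbox = { id: string; status: string };

// Update rows the API returned, but keep local rows the API has not caught
// up with yet (e.g. an optimistically added agent), and append genuinely
// new rows the API knows about that local state doesn't.
function mergeApiData(local: Sandbox[], apiAgents: Sandbox[]): Sandbox[] {
  const fromApi = new Map<string, Sandbox>(apiAgents.map((a) => [a.id, a]));
  const merged = local.map((row) => fromApi.get(row.id) ?? row);
  const localIds = new Set(local.map((row) => row.id));
  for (const agent of apiAgents) {
    if (!localIds.has(agent.id)) merged.push(agent);
  }
  return merged;
}
```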


🟡 MEDIUM — previousStatusesRef overwritten after poll completes

packages/lib/hooks/use-sandbox-status-poll.ts

The effect that syncs previousStatusesRef from the sandboxes prop runs after every render where sandboxes changes. If a poll finishes and updates previousStatusesRef, then a mergeApiData state update triggers a render and sandboxes changes, this effect will overwrite the ref with older values from props — potentially causing onTransitionToRunning to re-fire for an already-handled transition on the next poll.


🟡 MEDIUM — useSandboxStatusPoll never stops on 4xx responses

packages/lib/hooks/use-sandbox-status-poll.ts

On a non-2xx response the hook sets an error string but never calls cleanup(). A 404 (agent deleted) or 401 means the hook polls a gone/unauthorized resource indefinitely until the component unmounts. The poll should stop (or backoff with a limit) on persistent HTTP errors.
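The cutoff could be as small as a pure predicate the poll loop consults before scheduling the next tick (threshold value illustrative):

```typescript
// Decide whether polling should stop for good after an HTTP failure.
const MAX_CONSECUTIVE_FAILURES = 5; // illustrative threshold

function shouldStopPolling(httpStatus: number, consecutiveFailures: number): boolean {
  // 401/403/404 are persistent for this resource: stop immediately.
  if (httpStatus === 401 || httpStatus === 403 || httpStatus === 404) return true;
  // Transient errors (5xx, network hiccups mapped to 0): allow limited retries.
  return consecutiveFailures >= MAX_CONSECUTIVE_FAILURES;
}
```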


🟡 MEDIUM — No caching on the snapshot endpoint

packages/lib/services/admin-infrastructure.ts

Every GET to /api/v1/admin/infrastructure SSHes into every node, runs 5 commands per node, and queries two DB tables. A super admin rapidly refreshing can pile up concurrent SSH sessions to all Docker nodes. A simple in-memory cache with a 30–60s TTL would be appropriate here.
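One possible shape for that cache, sketched without any repo specifics; the single-flight collapse also addresses the rapid-refresh pile-up (with the caveat that module-level state only survives on a warm serverless instance):

```typescript
// Module-level TTL cache factory. Concurrent callers during a miss share
// one in-flight load instead of each firing their own SSH fan-out.
type CacheEntry<T> = { value: T; expiresAt: number };

function makeTtlCache<T>(ttlMs: number) {
  let entry: CacheEntry<T> | null = null;
  let inflight: Promise<T> | null = null;
  return async function getCached(load: () => Promise<T>): Promise<T> {
    if (entry && Date.now() < entry.expiresAt) return entry.value;
    if (!inflight) {
      inflight = load()
        .then((value) => {
          entry = { value, expiresAt: Date.now() + ttlMs };
          return value;
        })
        .finally(() => {
          inflight = null;
        });
    }
    return inflight;
  };
}
```

Wrapping getAdminInfrastructureSnapshot in `makeTtlCache(30_000)` would make repeated admin refreshes within 30s hit the cache.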


🟡 MEDIUM — Raw SSH error messages exposed in API response

packages/lib/services/admin-infrastructure.ts — inspectNodeRuntime() catch block

The raw error message from the SSH client is included in node.runtime.error in the API response. If the SSH client throws with a private key path, internal hostname, or other system info, it will be visible in the admin UI. Sanitize or categorize the error before returning it.


🟡 MEDIUM — Optimistic "provisioning" status not reverted synchronously

packages/ui/src/components/containers/milady-sandboxes-table.tsx — handleProvision()

On fetch failure, void refreshData() is called to revert the optimistic state. But refreshData is itself async and its catch swallows errors — if the network is down the row stays stuck at "provisioning" indefinitely. A synchronous revert (setLocalSandboxes(prev => ...)) before the fetch attempt is safer.


🟠 LOW — buildResourceAlert loses severity distinction

packages/lib/services/admin-infrastructure.ts L1400–1404

if (percent >= NODE_RESOURCE_CRITICAL_PCT) return `${label} ${percent}% used`;
if (percent >= NODE_RESOURCE_WARNING_PCT) return `${label} ${percent}% used`;

Both critical and warning return the same string format. The node alerts list cannot distinguish severity, making critical CPU/memory indistinguishable from a warning in the UI.


🟠 LOW — docker ps --filter name=milady- is a prefix match

packages/lib/services/admin-infrastructure.ts — inspectNodeRuntime()

docker ps --filter name=milady- matches any container whose name starts with milady-, not exact matches. Non-managed containers with this prefix on the host would appear as ghost containers and inflate counts. Consider using --filter name=^/milady- (anchored) or comparing against the known agent ID list.


🟠 LOW — router.refresh() called twice on provisioning complete

packages/ui/src/components/containers/create-milady-sandbox-dialog.tsx

When an agent reaches "running" and the dialog is then closed, router.refresh() fires from the status-watching effect and from handleClose() — two back-to-back RSC re-fetches for the same event.


🟠 LOW — Duplicate status color maps in two files

STATUS_BADGE_COLORS / STATUS_DOT_COLORS are defined near-identically in both milady-sandboxes-table.tsx and app/dashboard/containers/agents/[id]/page.tsx. Extract to a shared constants file to prevent drift.


🟠 LOW — agentId={createdAgentId!} non-null assertion

packages/ui/src/components/containers/create-milady-sandbox-dialog.tsx

The component is only rendered in the provisioning phase after createdAgentId is set, so this is logically safe, but the assertion is fragile. A conditional render guard is cleaner.


📝 Test coverage gaps

The 5 unit tests cover only classifyContainerHealth. Notable untested paths:

  • parseRuntimeContainers — pipe-delimited parsing edge cases (empty output, malformed lines, pipe chars in fields)
  • buildNodeAlerts / buildResourceAlert — the severity-loss bug above would be caught by one test
  • Ghost-container detection and allocation-drift logic in getAdminInfrastructureSnapshot
  • useSandboxListPoll — the transition-detection logic and onTransitionToRunning callback
  • classifyContainerHealth with dbStatus === "stopped" and runtime !== null (the degraded path)

Summary

| Priority | Issue |
| --- | --- |
| 🔴 HIGH | Auth error outside try/catch in new route |
| 🔴 HIGH | No overall timeout → Vercel 504 on SSH hang |
| 🟡 MEDIUM | SSH coordinates (host/port/user) exposed in API response |
| 🟡 MEDIUM | useSandboxListPoll missing initial poll |
| 🟡 MEDIUM | mergeApiData drops agents absent from API response |
| 🟡 MEDIUM | previousStatusesRef overwritten after poll update |
| 🟡 MEDIUM | Polling never stops on 404/401 responses |
| 🟡 MEDIUM | No caching on infrastructure snapshot |
| 🟡 MEDIUM | Raw SSH errors in API response |
| 🟠 LOW | Optimistic revert not synchronous |
| 🟠 LOW | buildResourceAlert loses severity |
| 🟠 LOW | docker ps prefix match instead of exact |
| 🟠 LOW | Double router.refresh() on dialog close |
| 🟠 LOW | Duplicate status color maps |
| 📝 | Test coverage for parsing, ghost detection, polling |

The two HIGH items (auth error escaping the try/catch and the missing overall SSH timeout) should be addressed before merge.

Reviewed by Claude Sonnet 4.6

Collapse multi-line <p> tag to single line per biome rules.
Two issues fixed:

1. Fire-and-forget calls in AgentBudgetService.deductBudget() for
   sendLowBudgetAlert() and triggerAutoRefill() used bare 'void'
   without .catch(), causing unhandled promise rejections that
   crashed the test runner.

2. Email template loadTemplate() used process.cwd()-relative path
   (lib/email/templates/) but templates live under packages/lib/.
   Switched to import.meta.dir-relative resolution for reliability.

These bugs are pre-existing on dev branch.
- Fix unused variable warnings by prefixing with underscore
- Fix import ordering and formatting in route.ts
@claude

claude bot commented Mar 16, 2026

Code Review

Overall this is a well-structured PR with a clear scope. The optimistic updates and local state management are a real improvement, and the provisioning progress dialog is a nice UX addition. A few issues worth addressing before merge:


🔴 Bugs

1. Double refresh on dialog close (create-milady-sandbox-dialog.tsx)

handleClose calls both onCreated?.() (which the parent wires to refreshData) and router.refresh() unconditionally. When the user closes after provisioning completes, both run simultaneously:

function handleClose() {
  setOpen(false);
  setTimeout(resetForm, 300);
  void onCreated?.();  // triggers refreshData() in parent
  router.refresh();   // also refreshes
}

onCreated is intended to push a client-side data refresh, while router.refresh() forces a full server re-render. These conflict — the server props will overwrite the freshly-merged client state. Since local state is now the source of truth, drop the router.refresh() from handleClose (or only call it when onCreated is not provided).

2. useSandboxListPoll fires no initial poll — 10s delay before first update

The hook only starts the interval; it never runs a poll immediately when it mounts. A user who creates an agent and closes the dialog waits 10 seconds before seeing any live data. Add an initial void poll() call inside the effect (same pattern as useSandboxStatusPoll already does correctly):

// immediately after setting isPolling(true):
void poll();
intervalRef.current = setInterval(() => void poll(), intervalMs);

3. milady/page.tsx drops the error boundary around listAgents

The old containers/page.tsx wrapped miladySandboxService.listAgents in a try/catch because the table may not exist in all environments. The new milady/page.tsx removed that guard entirely:

// new — will throw in environments without the milady_sandboxes table
const sandboxes = await miladySandboxService.listAgents(user.organization_id);

Restore the try/catch (or the // Table may not exist comment) to keep parity.


🟡 Performance / Security

4. No caching on the admin infrastructure endpoint

Every GET /api/v1/admin/infrastructure call SSHs into every Docker node synchronously. With the defaults (SSH_CONNECT_TIMEOUT_MS = 10_000, SSH_COMMAND_TIMEOUT_MS = 15_000), a single bad node can add up to 25s of latency. Three dead nodes = ~75s+ with Promise.all capped only by the slowest.

Three unused constants already hint at the intended solution that wasn't shipped:

const _SNAPSHOT_CACHE_TTL_MS = 30_000;        // unused
const _MAX_CONCURRENT_SSH_SESSIONS = 5;        // unused
const _NODE_INSPECTION_TIMEOUT_MS = 25_000;    // unused

At minimum, wrap getAdminInfrastructureSnapshot in a module-level cache keyed by timestamp (30s TTL), or add a force-dynamic note that this endpoint is intentionally slow and should only be called from the admin dashboard on demand. The endpoint should probably also not be polled automatically from the client without caching on the server side.

5. initialSandboxIdsRef comparison is order-sensitive

const newIds = initialSandboxes.map((sb) => sb.id).join(",");

If the server returns the same set of IDs in a different order (e.g. after a sort change), this won't detect the equality and will reset local state unnecessarily. Use a Set comparison or sort before joining.
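A sketch of an order-insensitive check, assuming IDs are unique:

```typescript
// True when both arrays contain the same set of IDs, regardless of order.
function sameIdSet(a: string[], b: string[]): boolean {
  if (a.length !== b.length) return false;
  const setA = new Set(a);
  // Equal length plus every b-ID present in a implies equal sets,
  // assuming IDs are unique within each list.
  return b.every((id) => setA.has(id));
}
```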


🔵 Code Quality

6. Status color maps duplicated across files

STATUS_BADGE_COLORS and STATUS_DOT_COLORS are defined identically in both milady-sandboxes-table.tsx and agents/[id]/page.tsx. These should live in a shared util (e.g. packages/lib/constants/sandbox-status.ts) and be imported in both places. Same for the formatRelative/formatRelativeShort helpers — very similar implementations in two files.

7. void onCreated?.() swallows async errors

onCreated is typed as () => void | Promise<void>. Calling it with void will silently discard any rejection (and ?.catch doesn't type-check on a void return). Use:

void Promise.resolve(onCreated?.()).catch((err) => logger.error("onCreated callback failed", { error: String(err) }));

8. createdAt: toIso(container.createdAt) ?? refreshedAt in admin-infrastructure.ts

Using refreshedAt as a fallback for a missing created_at is semantically wrong — it would display the snapshot time as the creation time in the UI. The column is NOT NULL in the schema so this fallback should never be needed; but if it could be null, "unknown" or null is a better sentinel than the current timestamp.


✅ Good Catches

  • agent-budgets.ts: replacing void this.triggerAutoRefill(...) with explicit .catch() logging is a real improvement — silent fire-and-forget errors are hard to track down.
  • template-renderer.ts: switching from process.cwd() to import.meta.dir is the correct Bun idiom for package-relative path resolution.
  • Optimistic updates with rollback on provision/suspend/delete are well-implemented.
  • Unit tests for classifyContainerHealth cover the important state machine edges.

- admin-infrastructure: add concurrency limiter (max 5 parallel SSH sessions),
  per-node inspection timeout (25s), and 30s in-memory snapshot cache
- admin-infrastructure: differentiate warning vs critical in buildResourceAlert
- admin-infrastructure: make parseDockerHealth more robust (strip prefix, exact match)
- admin-infrastructure: remove unused 'unknown' ContainerLiveHealthStatus variant
- use-sandbox-status-poll: stop polling on persistent 4xx errors or 5 consecutive failures
- use-sandbox-status-poll: fire initial poll immediately in useSandboxListPoll
- use-sandbox-status-poll: fix previousStatusesRef to preserve poll-derived statuses
- milady-sandboxes-table: union merge in mergeApiData to preserve optimistic additions
- create-milady-sandbox-dialog: deduplicate triple data refresh to single onCreated path
@claude

claude bot commented Mar 16, 2026

test

@claude

claude bot commented Mar 16, 2026

Eliza Cloud V2

Stack

  • Runtime: Bun
  • Framework: Next.js 15 (App Router)
  • Database: PostgreSQL + Drizzle ORM
  • Deployment: Vercel Serverless
  • UI: React + Tailwind CSS

Commands

bun install          # Install dependencies
bun run dev          # Start dev server
bun run build        # Production build
bun run db:migrate   # Apply database migrations
bun run db:generate  # Generate migration from schema
bun run db:studio    # Open Drizzle Studio

Database Migrations

Never use db:push - it's removed. All schema changes go through migrations.

Schema Change Workflow

  1. Edit schema in db/schemas/
  2. bun run db:generate
  3. Review SQL in db/migrations/
  4. bun run db:migrate
  5. Commit both schema + migration

Custom Migrations

npx drizzle-kit generate --custom --name=descriptive_name

Rules

  • No CREATE INDEX CONCURRENTLY (runs in transaction)
  • Use IF NOT EXISTS / IF EXISTS for creating tables
  • Never edit applied migrations
  • NEVER use omnibus migrations that recreate the full schema or existing objects - they will fail in production by locking active tables. Instead:
    1. Create small targeted migrations that ONLY add your new schema objects
    2. Use separate migrations for data backfills
    3. Put cleanup/drops in their own migration
    4. Group related objects together but limit migrations to <100 lines
  • See docs/database-migrations.md for details

Type Checking

bun run check-types has many pre-existing errors across the codebase (db/, lib/services/, app/). Don't try to fix them all — only verify your changed files have no new errors. Filter output:

bun run check-types 2>&1 | grep -E "(your-file\.ts|your-other-file\.ts)"

If the grep returns empty, your changes are clean. bun run build also fails on unrelated env vars (ELIZA_APP_DISCORD_BOT_TOKEN). Use check-types filtered to your files instead.

Project Structure

app/           # Next.js App Router pages
lib/           # Business logic, services
db/
  schemas/     # Drizzle schema definitions
  migrations/  # SQL migration files
  repositories/# Data access layer
components/    # React components
scripts/       # CLI utilities

@claude

claude bot commented Mar 16, 2026

Code Review

Overall this is a solid PR: the optimistic-update table, provisioning progress dialog, and page separation are all clear improvements. The admin infrastructure service is well-structured. A few issues worth addressing before merge.


Bugs

1. Underscore-prefixed constants used without underscore in admin-infrastructure.ts

Three constants are declared with a `_` prefix but referenced without it inside getAdminInfrastructureSnapshot, which will throw ReferenceError at runtime.

Declared: _NODE_INSPECTION_TIMEOUT_MS, _MAX_CONCURRENT_SSH_SESSIONS, _SNAPSHOT_CACHE_TTL_MS

Referenced as: NODE_INSPECTION_TIMEOUT_MS, MAX_CONCURRENT_SSH_SESSIONS, SNAPSHOT_CACHE_TTL_MS

The unit tests only cover classifyContainerHealth and do not call getAdminInfrastructureSnapshot, so they do not catch this. Drop the _ prefix from the declarations.


Reliability

2. Module-level snapshotCache is unreliable on Vercel

let snapshotCache: { data: AdminInfrastructureSnapshot; expiresAt: number } | null = null;

Vercel serverless functions do not guarantee module reuse across invocations. The cache may never be hit in practice, and concurrent cold starts will each independently fire all SSH connections. Consider moving caching to Redis/KV, or treating the endpoint as always-fresh and documenting the expected latency (~10-25 s per node).

3. onCreated errors are silently swallowed in the dialog

In handleClose, void onCreated?.() silently loses any rejection. If onCreated rejects, the table will not refresh and there is no user feedback. At minimum attach a .catch to log the error.


Code Quality

4. Fragile camelCase / snake_case conversion in mergeApiData

The merge function manually maps agentName -> agent_name, lastHeartbeatAt -> last_heartbeat_at, etc. SandboxListAgent even exposes both spellings (agentName and agent_name), indicating the API shape is inconsistent. A single canonical casing on the API response would eliminate this brittle mapping layer.

5. refreshData fetches an unbounded list

fetch('/api/v1/milady/agents') — as orgs accumulate agents this returns everything on every poll tick and on every action. Worth confirming a default page limit is enforced server-side, or passing one in the query string.

6. Inline pLimit re-implements a maintained package

The 18-line implementation is correct. For a file already at 959 lines, the p-limit package or a shared utility would be easier to audit. Minor note.


Test Coverage

The 5 tests for classifyContainerHealth are well-written. Untested code paths: inspectNodeRuntime and its parsing helpers (pure functions, easy to unit test), getAdminInfrastructureSnapshot, the admin infrastructure API route (auth enforcement, super_admin check), and the two polling hooks.


What's Good

  • Optimistic updates with proper rollback on failure — correct pattern used consistently
  • useSandboxStatusPoll stopping after 5 consecutive errors is thoughtful
  • template-renderer.ts path fix (import.meta.dir vs process.cwd()) is the right Bun idiom
  • agent-budgets.ts fire-and-forget improvement (void to .catch(logger.error)) is strictly better
  • Mobile card layout is a nice addition
  • Status badge/dot system is consistent across all refactored components

Summary: Item 1 (underscore constant mismatch) is a runtime crash on the first admin infrastructure page load — needs to be fixed before merge. Item 2 (serverless cache) is worth at least a code comment. Everything else is a minor quality note.

- Add MILADY_PRICING constants (/bin/bash.02/hr running, /bin/bash.0025/hr idle,  minimum deposit)
- Add billing columns to milady_sandboxes schema (billing_status, last_billed_at, hourly_rate, total_billed, shutdown/warning timestamps)
- Create SQL migration 0052_add_milady_billing_columns
- Add credit gate (checkMiladyCreditGate) enforcing the minimum balance before agent create/provision/resume
- Create hourly milady-billing cron mirroring container-billing pattern:
  - Bills running agents at $0.02/hr
  - Bills stopped agents with snapshots at $0.0025/hr
  - Sends 48h shutdown warnings on insufficient credits
  - Auto-suspends agents after grace period
- Add PostHog event types for milady billing analytics
- Register cron in vercel.json (hourly at minute 0)
Previous CI fix incorrectly prefixed NODE_INSPECTION_TIMEOUT_MS,
MAX_CONCURRENT_SSH_SESSIONS, and SNAPSHOT_CACHE_TTL_MS with underscores
but the code still references the unprefixed names.

import.meta.dir is not available in this TypeScript configuration.
Use fileURLToPath(import.meta.url) + path.dirname() instead.

- Remove duplicate local formatRelative function in milady-sandboxes-table
- Fix PostHog EventProperties type to accept Record<string, unknown>
- Extract sandbox status constants to shared module
- Add ambient declaration for @elizaos/plugin-sql/node (createDatabaseAdapter)
- Add ambient declaration for bs58 module
- Add ambient declaration for monaco-editor with IStandaloneCodeEditor
- Fix webkitAudioContext type assertion in audio.ts
- Add ./packages/types/**/*.d.ts to include paths so ambient declarations
  are picked up by the split type-checker
- Exclude **/*.test.ts and **/*.test.tsx to avoid vitest import errors
  in colocated test files

The v1 milady agent create and provision routes now call
checkMiladyCreditGate before proceeding. Tests that exercise these
routes need to mock the billing gate to return { allowed: true }.
@claude
Copy link

claude bot commented Mar 16, 2026

Code Review

Bugs / Issues

🔴 Critical — Billing cron: stale org balance for multi-sandbox orgs
app/api/cron/milady-billing/route.ts (main billing loop)

The cron fetches each org's credit_balance once from dbRead, then processes each sandbox sequentially, mutating the in-memory org.credit_balance after each debit. This is a TOCTOU race: for orgs with multiple sandboxes, the second sandbox reads a stale balance rather than re-querying the DB. Under concurrent cron runs (e.g., a stuck run + a fresh run), double-charging is possible. Recommend using SELECT FOR UPDATE per org, or processing all sandboxes for one org in a single atomic transaction.

🟡 Medium — "exempt" billing status not excluded from query
app/api/cron/milady-billing/route.ts (sandbox query, inArray filter)

The billing query includes sandboxes with billing_status in ["active", "warning", "shutdown_pending"] — but "exempt" is in the MiladyBillingStatus type and is never excluded. Any sandbox manually marked "exempt" would be billed. The "warning" status is also fetched but never set by this cron, so those sandboxes may be double-warned or charged unexpectedly. Recommend explicitly excluding "exempt" from the query and clarifying or removing the unused "warning" state.
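An allow-list makes the exclusion structural rather than relying on remembering to filter. A sketch — the status union is assembled from the values named in this review, so treat the exact members as assumed:

```typescript
// Status values as described in this review; the real union lives in the schema.
type MiladyBillingStatus =
  | "active"
  | "warning"
  | "shutdown_pending"
  | "suspended"
  | "exempt";

// Sketch: enumerate billable statuses explicitly so "exempt" (and any future
// non-billable status) can never slip into the billing query by omission.
const BILLABLE_STATUSES: readonly MiladyBillingStatus[] = ["active", "shutdown_pending"];

function isBillable(status: MiladyBillingStatus): boolean {
  return BILLABLE_STATUSES.includes(status);
}
```

The same constant can feed the cron's `inArray` filter, keeping query and type in sync.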

🟡 Medium — total_billed precision mismatch
app/api/cron/milady-billing/route.ts, billing update block

total_billed: String(Number(sandbox.total_billed) + hourlyCost),

total_billed is numeric(10,2) but IDLE_HOURLY_RATE is 0.0025 (4 decimal places). PostgreSQL will silently round/truncate the stored value. Either change the column to numeric(10,4) or round before storage.
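If the column is widened to numeric(10,4), rounding to the column's scale before storage keeps the string and the stored value identical. A sketch under that assumption:

```typescript
// Assumes the total_billed column has been widened to numeric(10,4);
// round to the column's scale so Postgres never silently truncates.
const BILLING_SCALE = 4;

function addBilled(totalBilled: string, hourlyCost: number): string {
  return (Number(totalBilled) + hourlyCost).toFixed(BILLING_SCALE);
}
```

With the idle rate of 0.0025, `addBilled("1.00", 0.0025)` produces a value the column can store exactly, where `numeric(10,2)` would have rounded it.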

🟡 Low — handleClose fires onCreated on premature dismissal
packages/ui/src/components/containers/create-milady-sandbox-dialog.tsx

If the user opens the dialog and immediately dismisses it (before submitting), handleClose still calls onCreated?.(), triggering an unnecessary table refresh on the parent. The callback should only fire when an agent was actually created.
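The fix is a flag that gates the callback on an actual successful create. A framework-free sketch — the function and parameter names are illustrative, not the dialog's real internals:

```typescript
// Sketch: the close handler only notifies the parent (triggering a table
// refresh) when a create actually succeeded before the dialog was dismissed.
function makeCloseHandler(
  wasCreated: () => boolean, // e.g. reads a `created` state flag set on success
  onCreated?: () => void,
  onClose?: () => void,
) {
  return () => {
    if (wasCreated()) onCreated?.();
    onClose?.();
  };
}
```

In the React component this maps to a `useState(false)` flag set in the submit success path and read in `handleClose`.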


Security

🟡 Medium — Admin infrastructure endpoint serializes sensitive topology
app/api/v1/admin/infrastructure/route.ts + packages/lib/services/admin-infrastructure.ts

The API response includes sshPort, sshUser, headscaleIp, bridgeUrl, healthUrl directly in the JSON. While the endpoint is super_admin-gated, a compromised session or a logging system that captures response bodies would leak internal SSH ports, VPN IPs, and service URLs. Consider stripping low-level connection details from the response and keeping them server-side only.

🟡 Low — verifyCronSecret length check leaks timing information
app/api/cron/milady-billing/route.ts, verifyCronSecret

return (
  providedBuffer.length === secretBuffer.length && timingSafeEqual(providedBuffer, secretBuffer)
);

The early-exit on length comparison is a minor timing oracle — it reveals whether the submitted secret has the correct length. Pad both buffers to the same length before calling timingSafeEqual to achieve true constant-time comparison. Low risk given high-entropy secrets, but worth fixing.
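One common way to remove the length oracle is to hash both inputs to fixed-length digests before the timing-safe comparison, so `timingSafeEqual` always receives equal-length buffers. A sketch using node:crypto (the fixed HMAC key here exists only to equalize lengths; it is not a secret):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch: HMAC both sides with a fixed key so the buffers passed to
// timingSafeEqual are always 32 bytes -- no early exit on length.
function safeCompare(provided: string, secret: string): boolean {
  const key = "cron-secret-compare"; // fixed, non-secret length-equalizer
  const a = createHmac("sha256", key).update(provided).digest();
  const b = createHmac("sha256", key).update(secret).digest();
  return timingSafeEqual(a, b);
}
```

This also sidesteps `timingSafeEqual` throwing on unequal-length inputs.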

🟡 Low — EventProperties escape-hatch defeats discriminated union
packages/lib/analytics/posthog.ts

Adding | Record<string, unknown> to the EventProperties union makes it structurally equivalent to Record<string, unknown>, collapsing all type safety on event payloads. Any object satisfies the type, so mis-typed events won't be caught at compile time.
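A separate, explicitly-named entry point preserves the strict union while still allowing one-off events. A sketch — the event names and shapes are hypothetical, and `track` returns the event name only to keep the example self-contained (a real implementation would forward to PostHog):

```typescript
// Sketch: keep the discriminated union strict; ad-hoc events go through a
// distinct function whose loose typing is visible at the call site.
type StrictEventProperties =
  | { event: "milady_agent_created"; agentId: string }
  | { event: "milady_agent_billed"; agentId: string; amountUsd: number };

type AdHocEventProperties = Record<string, unknown>;

function track(props: StrictEventProperties): string {
  return props.event; // real impl: posthog.capture(...)
}

function trackAdHoc(event: string, _props: AdHocEventProperties): string {
  return event; // real impl: posthog.capture(...)
}
```

Mis-typed calls to `track` now fail at compile time, while `trackAdHoc` signals "unchecked payload" by name.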


Performance

🟡 Medium — Serial SSH inspection doesn't scale
packages/lib/services/admin-infrastructure.ts

MAX_CONCURRENT_SSH_SESSIONS = 5 with a 25s timeout per session means best-case latency for 10 nodes is ~50s. The endpoint has no maxDuration export, which risks hitting Vercel's function timeout as the cluster grows. Consider declaring an explicit maxDuration and documenting the node-count ceiling.

🟡 Low — Unbounded sandbox query in admin service
packages/lib/services/admin-infrastructure.ts

The query fetches all sandbox rows with no LIMIT. As the platform scales this could become a large result set. Consider paginating or adding a limit to the admin view.

🟡 Low — In-memory cache is ineffective in Vercel serverless
packages/lib/services/admin-infrastructure.ts

let snapshotCache: { data: AdminInfrastructureSnapshot; expiresAt: number } | null = null;

Module-level cache has a fresh context on every cold start in Vercel serverless. The 30-second TTL will never be hit in production — every request will be a cache miss. Either use a shared cache (Redis/KV) or remove the misleading cache mechanism.


Code Quality

SandboxListAgent interface has dual-format fields
packages/lib/hooks/use-sandbox-status-poll.ts

export interface SandboxListAgent {
  agentName?: string;
  agent_name?: string;
  // ...
}

Supporting both camelCase and snake_case for the same field suggests an inconsistent API response format. Standardize the API response and simplify the interface.

Custom pLimit vs npm package
packages/lib/services/admin-infrastructure.ts

A 15-line custom concurrency limiter was written inline. p-limit is almost certainly already a transitive dependency in this Bun project and is battle-tested. Worth checking before maintaining a custom implementation.

handleRetryProvision doesn't reset poll state
packages/ui/src/components/containers/create-milady-sandbox-dialog.tsx

On retry after provisioning error, only provisionStartTime is reset. The useSandboxStatusPoll hook's error/status state is not re-initialized (since agentId doesn't change), so users retrying may see the previous error state persist briefly.


Positives

  • Optimistic updates are well-implemented — previous state is captured, optimistic mutation applied, and reverted on failure.
  • Polling hooks have solid circuit-breaking: stop after 5 consecutive errors or persistent 4xx, preventing unbounded polling under failure.
  • classifyContainerHealth is a clean pure function with good unit test coverage (5 tests, healthy/unhealthy/missing/warming/stale paths all covered).
  • checkMiladyCreditGate fails closed on DB errors — correct for a billing gate.
  • Page separation (containers vs milady) is a clean architectural improvement.
  • Agent detail page is a net ~150-line reduction with improved clarity.
  • Billing cron mirrors the container billing pattern — consistent with existing conventions.

Summary

The overall structure is solid and follows established patterns. Two issues should be resolved before merge:

  1. Billing TOCTOU / stale balance — potential financial correctness bug for orgs with multiple sandboxes.
  2. "exempt" billing status not excluded — exempt agents would be incorrectly billed.

The sensitive SSH data exposure in the admin API is worth a follow-up discussion on what the dashboard actually renders vs. what it exposes in the response body.

@claude
Copy link

claude bot commented Mar 16, 2026

Eliza Cloud V2

Stack

  • Runtime: Bun
  • Framework: Next.js 15 (App Router)
  • Database: PostgreSQL + Drizzle ORM
  • Deployment: Vercel Serverless
  • UI: React + Tailwind CSS

Commands

bun install          # Install dependencies
bun run dev          # Start dev server
bun run build        # Production build
bun run db:migrate   # Apply database migrations
bun run db:generate  # Generate migration from schema
bun run db:studio    # Open Drizzle Studio

Database Migrations

Never use db:push - it's removed. All schema changes go through migrations.

Schema Change Workflow

  1. Edit schema in db/schemas/
  2. bun run db:generate
  3. Review SQL in db/migrations/
  4. bun run db:migrate
  5. Commit both schema + migration

Custom Migrations

npx drizzle-kit generate --custom --name=descriptive_name

Rules

  • No CREATE INDEX CONCURRENTLY (runs in transaction)
  • Use IF NOT EXISTS / IF EXISTS for creating tables
  • Never edit applied migrations
  • NEVER use omnibus migrations that recreate the full schema or existing objects - they will fail in production by locking active tables. Instead:
    1. Create small targeted migrations that ONLY add your new schema objects
    2. Use separate migrations for data backfills
    3. Put cleanup/drops in their own migration
    4. Group related objects together but limit migrations to <100 lines
  • See docs/database-migrations.md for details

Type Checking

bun run check-types has many pre-existing errors across the codebase (db/, lib/services/, app/). Don't try to fix them all — only verify your changed files have no new errors. Filter output:

bun run check-types 2>&1 | grep -E "(your-file\.ts|your-other-file\.ts)"

If the grep returns empty, your changes are clean. bun run build also fails on unrelated env vars (ELIZA_APP_DISCORD_BOT_TOKEN). Use check-types filtered to your files instead.

Project Structure

app/           # Next.js App Router pages
lib/           # Business logic, services
db/
  schemas/     # Drizzle schema definitions
  migrations/  # SQL migration files
  repositories/# Data access layer
components/    # React components
scripts/       # CLI utilities

@claude
Copy link

claude bot commented Mar 16, 2026

Eliza Cloud V2

Stack

  • Runtime: Bun
  • Framework: Next.js 15 (App Router)
  • Database: PostgreSQL + Drizzle ORM
  • Deployment: Vercel Serverless
  • UI: React + Tailwind CSS

Commands

bun install          # Install dependencies
bun run dev          # Start dev server
bun run build        # Production build
bun run db:migrate   # Apply database migrations
bun run db:generate  # Generate migration from schema
bun run db:studio    # Open Drizzle Studio

Database Migrations

Never use db:push - it's removed. All schema changes go through migrations.

Schema Change Workflow

  1. Edit schema in packages/db/schemas/
  2. bun run db:generate
  3. Review SQL in packages/db/migrations/
  4. bun run db:migrate
  5. Commit both schema + migration

Custom Migrations

npx drizzle-kit generate --custom --name=descriptive_name

Rules

  • No CREATE INDEX CONCURRENTLY (runs in transaction)
  • Use IF NOT EXISTS / IF EXISTS
  • Never edit applied migrations
  • See docs/database-migrations.md for details

Project Structure

app/               # Next.js App Router pages
packages/
  lib/             # Business logic, services
  db/
    schemas/       # Drizzle schema definitions
    migrations/    # SQL migration files
    repositories/  # Data access layer
  components/      # React components
  ui/              # Shared UI component library
  tests/           # Test suites
  types/           # Shared TypeScript generic types
  scripts/         # CLI utilities
  infra/           # Infrastructure logic
  config/          # Shared settings

@claude
Copy link

claude bot commented Mar 16, 2026

PR Review — feat: dashboard improvements (admin infra, table reactivity, page separation)

Overall this is a solid PR with good test coverage for the new infrastructure service. The billing cron, credit gate, and table reactivity work are all well-structured. A few issues worth addressing before merge:


🔴 Critical — Billing double-charge risk

File: app/api/cron/milady-billing/route.ts

The runningSandboxes query doesn't guard against re-billing within the same hour. If Vercel retries the cron (e.g. due to a timeout mid-run), or if two invocations overlap, every sandbox in billing_status = 'active' gets charged again:

// No last_billed_at guard here
.where(
  and(
    eq(miladySandboxes.status, "running"),
    inArray(miladySandboxes.billing_status, ["active", "warning", "shutdown_pending"]),
  ),
)

Fix: Add an idempotency check to the query:

or(
  isNull(miladySandboxes.last_billed_at),
  lt(miladySandboxes.last_billed_at, sql`now() - interval '55 minutes'`),
)

55 minutes gives a safe margin for a cron scheduled at 0 * * * *.


🟡 Medium — Stale org balance allows negative credit balances

File: app/api/cron/milady-billing/route.ts (~line 370)

// Update org balance in memory for next sandbox in same org
org.credit_balance = String(result.newBalance);

The result.newBalance comes from the DB RETURNING after the atomic deduction, so the in-memory map stays consistent across sequential sandboxes within a single run. However, currentBalance used in processSandboxBilling is the balance at the start of the transaction, not after. If a concurrent credit spend (e.g. container billing) happens mid-run, the pre-check currentBalance < hourlyCost can pass on a balance that's already been spent elsewhere, resulting in negative balances.

This is hard to fully eliminate without per-org advisory locks, but at minimum the atomic UPDATE … RETURNING already ensures the actual deduction is correct. The risk is only in the insufficient-credits pre-check potentially allowing one extra billing cycle. Low impact, worth noting.


🟡 Medium — EventProperties type safety regression

File: packages/lib/analytics/posthog.ts

| Record<string, unknown>; // "escape hatch for ad-hoc properties"

Adding Record<string, unknown> to the EventProperties union makes the entire union effectively Record<string, unknown>, since every discriminated union member satisfies that type. This removes compile-time enforcement that analytics events use the correct property shapes. If the goal is to allow one-off events, consider a separate overload or a distinct AdHocEventProperties type that's used only where needed.


🟡 Medium — Internal Headscale URL exposed in API response

File: app/api/v1/admin/headscale/route.ts (line ~137)

serverUrl: HEADSCALE_API_URL,

This exposes the internal VPN management server URL in admin API responses. Even though it's super-admin only, it unnecessarily leaks infrastructure topology. If this is needed for the dashboard display, it's fine — just confirm it's intentional.


🟢 Low — HMAC key for cron auth is unnecessarily complex

File: app/api/cron/milady-billing/route.ts

const hmacKey = Buffer.from("milady-billing-cron"); // hardcoded key
const a = createHmac("sha256", hmacKey).update(providedSecret).digest();
const b = createHmac("sha256", hmacKey).update(cronSecret).digest();
return timingSafeEqual(a, b);

This is functionally secure (timing-safe, no length leak), but the HMAC here is being used as a fixed-key hash just to get constant-length outputs for timingSafeEqual — both inputs are the same length naturally since they're secret strings. A simpler equivalent using crypto.scryptSync or just Buffer.from with length equalization would be clearer. Not a bug, just unusual.


🟢 Low — jwtVerify type cast in session-service

File: packages/lib/services/eliza-app/session-service.ts

const sessionPayload = payload as unknown as ElizaAppSessionPayload;

The double cast (as unknown as) is a sign the types don't align. The jwtVerify generic was dropped to work around a type error. Consider keeping the generic and casting only the properties that need it, or filing a follow-up to properly type the JWT payload.


✅ What's good

  • Migration follows the rules perfectly: small, targeted, uses IF NOT EXISTS, no omnibus changes.
  • Credit gate correctly fails closed on DB errors — good defensive choice.
  • agent-budgets.ts — replacing void promise with .catch(logger.error) is a clear improvement.
  • useSandboxStatusPoll — consecutive error counting + per-4xx stop is solid UX.
  • Unit tests for classifyContainerHealth cover all meaningful states.
  • admin-infrastructure.ts — custom pLimit and withTimeout are clean, and the snapshot cache with TTL is appropriate for this admin endpoint.
  • Headscale v0.23+ fallback — graceful degradation to the legacy /machine endpoint is the right approach.

Blocking on the double-billing issue (#1) before merge. The EventProperties regression (#3) is also worth fixing to preserve analytics type safety.


@claude
Copy link

claude bot commented Mar 16, 2026

PR Review Part 1 of 4 - CRITICAL ISSUES

  1. Billing cron has no idempotency guard - double-billing risk. File: app/api/cron/milady-billing/route.ts. The query selecting billable sandboxes has no last_billed_at filter. Vercel retries cron jobs on failure, and the endpoint can be triggered manually. If it fires twice within the same hour, every billable org gets charged twice. Fix: add a time guard to the WHERE clause using isNull(last_billed_at) OR lt(last_billed_at, now() minus 59 minutes).

  2. Credit balance can go negative for multi-sandbox orgs. File: processSandboxBilling(). The pre-check currentBalance < hourlyCost happens outside the transaction with no floor guard. For an org whose balance covers only one charge but which has two running sandboxes, both pass the pre-check with the same stale in-memory balance, then both transactions execute, pushing the balance negative. Fix: add gte(organizations.credit_balance, String(hourlyCost)) to the WHERE clause and treat updatedOrg === undefined as a failed pre-check.

Items 1 and 2 are real money bugs - fix before shipping to production.

@claude
Copy link

claude bot commented Mar 16, 2026

PR Review Part 2 of 4 - MODERATE ISSUES

  3. warning billing status is dead code. MiladyBillingStatus includes warning and the billing query filters for it, but the billing code never transitions a sandbox to warning; the path goes from active to shutdown_pending directly. Either remove warning from the type and query, or document when a sandbox enters that state.

  4. Non-deterministic warning email recipient. getOrgUserEmail() uses users[0].email, but listByOrganization returns users in unspecified order. For multi-member orgs the shutdown warning goes to a random member. This fallback should order by created_at ASC to consistently reach the org owner.

  5. No type-check verification mentioned. The PR description notes that biome check passes but not bun run check-types. Worth confirming before merge given the new numeric column types and satisfies assertions.

@claude
Copy link

claude bot commented Mar 16, 2026

PR Review Part 3 of 4 - MINOR OBSERVATIONS

  6. hourly_rate DB column diverges from runtime computation. The schema adds an hourly_rate column, but processSandboxBilling always computes the rate from constants and never reads sandbox.hourly_rate. If pricing constants change, the stored column value becomes stale. Either use the stored rate or remove the column.

  7. In-memory snapshot cache is per-process. The snapshotCache in the admin infrastructure service is module-level. On Vercel each function invocation may be a fresh process, so the 30s TTL may not deduplicate as intended. Low impact but worth a comment.

  8. SSH key provenance not visible in the diff. The admin infrastructure service opens SSH connections to Docker nodes, but the diff does not show where the private key is sourced. A comment pointing to the required env var would help operators.

  9. results.slice(0, 100) silently truncates the billing response. Installations with more than 100 billable sandboxes will see a truncated cron response. Log output is untruncated, so this is a monitoring UX issue only.

@claude
Copy link

claude bot commented Mar 16, 2026

PR Review Part 4 of 4 - WHAT IS SOLID

  • Migration uses IF NOT EXISTS throughout - safe to re-run, follows project migration rules, targeted and under 15 lines.
  • HMAC + timingSafeEqual in verifyCronSecret is correct and well-commented.
  • milady-billing-gate failing closed on DB error is the right default.
  • pLimit + withTimeout in the SSH inspection service prevent runaway connections.
  • Optimistic updates with proper rollback in milady-sandboxes-table is a clean pattern.
  • useSandboxListPoll using useRef for callbacks avoids stale closure bugs.

Summary: Good overall structure - the table reactivity rewrite is a clear improvement, the admin infra service is well-organized, and the migration follows project rules. Items 1 and 2 (double-billing and negative credit balance) are the critical ones to fix before this ships to production - both are real money bugs.

@claude
Copy link

claude bot commented Mar 16, 2026

PR Review: feat: dashboard improvements

Overall this is a well-structured PR. The billing cron, admin infra service, and UI reactivity work are solid. A few issues worth addressing before merge.

BUGS

  1. formatRelativeShort duplication — app/dashboard/containers/agents/[id]/page.tsx defines its own inline formatRelativeShort that diverges from the exported version in packages/lib/constants/sandbox-status.ts. The constants version should be used everywhere for consistency.

  2. Double-billing risk under concurrent invocations — In milady-billing/route.ts, org balances are fetched upfront then updated in-memory after each billed sandbox. The DB transaction deducts atomically (good), but the pre-transaction sufficiency check uses the in-memory snapshot. If two cron invocations overlap (Vercel cold-start retry or manual POST trigger), both can read the same pre-deduction balance and bill the same hour twice. A last_billed_at recency guard on each sandbox would close this — skip any sandbox billed within the last 50 minutes.

  3. Silent refreshData failures in sandboxes table — In milady-sandboxes-table.tsx the empty catch block in refreshData swallows all network errors without logging. At minimum a console.warn helps debug stale-data reports in production.
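A small wrapper shows the intended shape — the function name is illustrative, not the table component's actual code:

```typescript
// Sketch: run the refresh, but log failures instead of swallowing them, so
// stale-data reports in production leave a trace in the console.
async function refreshDataSafe<T>(fetcher: () => Promise<T>): Promise<T | null> {
  try {
    return await fetcher();
  } catch (err) {
    console.warn("[milady-sandboxes-table] refreshData failed:", err);
    return null; // caller keeps the current local state on failure
  }
}
```

Returning `null` (rather than rethrowing) preserves the existing behavior of keeping the last-known data while making the failure observable.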

SECURITY

  1. MILADY_CLOUD_ROADMAP.md committed to docs/ — This file contains internal pricing strategy, capacity planning targets, and planned features. Confirm this is intentional before merging into a public repo.

PERFORMANCE

  1. Sequential billing loop may exceed 5-minute maxDuration — Each sandbox is processed with sequential await processSandboxBilling(). The admin infra service correctly uses pLimit for SSH inspection, but the billing cron has no batching. At scale (200+ running agents) cumulative DB transaction latency could hit the timeout. Consider Promise.allSettled in batches of 10-20.
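The batching pattern suggested above can be sketched generically — this is an illustration of the approach, not the cron's actual code:

```typescript
// Sketch: process items in parallel batches of `batchSize` instead of one
// sequential await per item. allSettled keeps one failed billing from
// aborting the rest of the batch.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.allSettled(batch.map(worker))));
  }
  return results;
}
```

With a batch size of 10-20, wall-clock time approaches max-latency-per-batch rather than the sum of all transaction latencies. Note this increases DB connection pressure, so the batch size should stay below the pool size.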

MINOR

  1. MILADY_PRICING.SUSPEND_THRESHOLD and LOW_CREDIT_WARNING are unused dead constants — the billing cron only reads RUNNING_HOURLY_RATE, IDLE_HOURLY_RATE, MINIMUM_DEPOSIT, and GRACE_PERIOD_HOURS. Either wire them up or remove them.

  2. Module-level snapshotCache is per-serverless-instance — On Vercel each function instance has isolated memory, so the 30s TTL won't deduplicate across concurrent requests. Not a bug, but worth a comment so future maintainers don't expect cross-request caching.

  3. SandboxListAgent dual camelCase/snake_case fields — agentName and agent_name coexisting plus the [key: string]: unknown index signature is a smell. The list API endpoint should return a consistent casing so the client doesn't need to handle both conventions.

WHAT IS GOOD

  • Atomic SQL transactions for credit deductions — correctly avoids TOCTOU on the debit side
  • HMAC-based timing-safe cron secret comparison
  • pLimit concurrency cap on SSH node inspection
  • Fail-closed billing gate (allowed: false on DB errors, not true)
  • Smart sorted-IDs ref comparison to detect server prop changes without deep equality
  • Optimistic per-sandbox status updates are clean and snappy
  • Migration uses IF NOT EXISTS and only adds new columns — follows project conventions
  • Good unit tests for classifyContainerHealth covering the main state branches
  • jose v4 downgrade with proper KeyLike adapter in session-service.ts and jwks.ts
  • agent-budgets.ts: replacing void promise with .catch(logger.error) is the right pattern

@claude
Copy link

claude bot commented Mar 16, 2026

Code Review for PR 393 - Overall this is solid work. The billing infrastructure is largely done right. A few issues worth addressing.

BUGS:

  1. billing_status 'warning' is a dead state - packages/lib/constants/milady-pricing.ts defines LOW_CREDIT_WARNING and SUSPEND_THRESHOLD but the billing cron never references them. The 'warning' billing_status appears in cron queries but no code path ever sets a sandbox to that state. Agents jump from 'active' directly to 'shutdown_pending'. Either wire up the warning transition at LOW_CREDIT_WARNING threshold, or remove the unused constants and 'warning' from MiladyBillingStatus.

  2. Duplicate formatRelativeShort - defined in both packages/lib/constants/sandbox-status.ts (exported) and as a local function in app/dashboard/containers/agents/[id]/page.tsx with a different null fallback. The page should import from the shared location.

PERFORMANCE:

  1. Sequential billing loop will time out at scale - app/api/cron/milady-billing/route.ts processes sandboxes one-at-a-time, each doing multiple DB round-trips. With maxDuration=300 this starts failing at scale. Consider parallel batches using the pLimit pattern already in admin-infrastructure.ts.

  2. Missing index on milady_sandboxes.last_billed_at - the migration adds an index on billing_status but the cron query also filters heavily on last_billed_at (IS NULL and less-than rebillCutoff). Without an index this is a full table scan at scale. Recommend: CREATE INDEX IF NOT EXISTS milady_sandboxes_last_billed_at_idx ON milady_sandboxes (last_billed_at)

  3. Module-level snapshot cache is ineffective on Vercel - admin-infrastructure.ts uses a module-level snapshotCache with 30s TTL. In Vercel serverless each cold start is a fresh instance so the cache rarely helps. The comment about reducing SSH load is misleading.

SECURITY:

  1. Billing cron response exposes org IDs - the endpoint returns up to 100 BillingResult objects each containing organizationId. While CRON_SECRET protects the endpoint, raw org IDs have no operational value in a cron response. Consider omitting or gating behind a debug param.

SCHEMA:

  1. billing_status has no DB-level constraint - the column is text with only a TypeScript type annotation, no Postgres CHECK constraint. A raw SQL write or typo could silently create an invalid status that the cron query never matches. Consider adding a check constraint in a follow-up migration.

WHAT IS GOOD:

  • Atomic billing tx with rebill-guard pattern prevents double-billing across overlapping cron runs
  • Fail-closed billing gate: if credit check fails, provisioning is blocked
  • Headscale v0.23+ fallback with graceful degradation to legacy endpoint
  • timingSafeEqual for cron auth: proper constant-time HMAC comparison
  • Custom pLimit in admin-infrastructure avoids an unnecessary dependency
  • Billing cron tests are thorough: mock transactions, guard window, shutdown path
  • Page separation (containers/milady) is clean with correct graceful degradation
  • useSandboxListPoll correctly avoids stale closures using refs for callbacks

MINOR:

  • docs/BILLING_AUDIT.md and docs/ONBOARDING_INTEGRATION_PLAN.md are ~1440 lines of planning docs with internal pricing details. Worth deciding if these belong in the repo or a wiki.
  • The jose downgrade (v6 to v4 in bun.lock) is a significant version change. Worth confirming it was intentional and that JWKS behavior was validated.

@claude
Copy link

claude bot commented Mar 16, 2026

Code Review

Overall this is a solid, well-structured PR. The billing infrastructure is modelled closely after the existing containers billing pattern, the migration follows CLAUDE.md rules, and the optimistic-update table rewrite fixes a real UX bug. A few issues worth addressing before merging:


🐛 Potential Bugs

1. MiladyBillingStatus includes "warning" but it's never assigned

packages/db/schemas/milady-sandboxes.ts:25 defines "warning" in the union, but the billing cron only ever sets "active", "shutdown_pending", or "suspended". LOW_CREDIT_WARNING and SUSPEND_THRESHOLD from milady-pricing.ts are also defined but never referenced. Either wire up pre-warning logic or remove the dead values to avoid future confusion.

2. snapshotCache (module-level singleton) won't survive between Vercel invocations

packages/lib/services/admin-infrastructure.ts stores an in-memory cache at module scope with a 30-second TTL. On Vercel serverless, each cold invocation starts a fresh module, so the cache will always be empty in practice. The cache only helps if the same container instance receives two requests within 30 s, which is unlikely for an admin-only endpoint. Either remove the cache or document the limitation so a future Redis/KV layer isn't surprising.

3. mergeApiData accesses agent.agent_name on a camelCase type

In milady-sandboxes-table.tsx (~line 200):

agent_name: agent.agentName ?? agent.agent_name ?? existing?.agent_name ?? null,

SandboxListAgent is a typed camelCase interface — agent.agent_name is always undefined at runtime. The fallback is harmless but signals a type mismatch. Drop the middle term.

4. Billing cron doesn't reset billing_status to "active" after a successful charge on a "suspended" agent

If an org tops up credits but their agent is in "suspended" status, the next billing run will attempt to charge and succeed, but the sandbox is already marked stopped with sandbox_id: null. The billing transaction will succeed (deducting credits) without re-provisioning the agent. The user loses money with no visible change. Consider either skipping billing for "suspended" agents or making suspension reversible via the resume flow.


⚠️ Minor Issues

5. HEARTBEAT_WARNING_MINUTES = 5 is defined but unused

admin-infrastructure.ts:9 defines this constant, but only HEARTBEAT_STALE_MINUTES = 15 is actually checked in classifyContainerHealth. If a two-tier warning/stale system is planned, add a TODO. Otherwise remove it.

6. Misleading comment in milady-pricing.ts

The file header says "billed via a daily cron" but the cron schedule in vercel.json is 0 * * * * (hourly). Minor but confusing for anyone reading the constants.

7. Email template says "container" for Milady agents

sendContainerShutdownWarningEmail is reused with containerName: `Milady Agent: ${agentName}` — acceptable workaround, but the email subject line likely still says "container." If users are confused by receiving a "container shutdown" warning for their agent, a Milady-specific template (or at least a subject-line override) would help.

8. tsconfig.app.chat.temp.json committed

The filename signals a scratch/temp file. If it's needed for a build artifact, rename it. Otherwise remove it.

9. billing_status column is untyped at the database level

The schema uses text("billing_status").$type<MiladyBillingStatus>() — the TypeScript type is enforced but the DB has no check constraint. An invalid status string (e.g. from a manual SQL edit or a future migration bug) would pass silently. Consider a pgEnum or a CHECK constraint to match the containers table pattern.
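A possible shape for such a constraint, as a hedged migration sketch; the table name, constraint name, and the exact status values are assumptions based on statuses mentioned elsewhere in this review.

```sql
-- Illustrative migration fragment (constraint name and value set assumed):
ALTER TABLE milady_sandboxes
  ADD CONSTRAINT milady_billing_status_check
  CHECK (billing_status IN ('active', 'suspended', 'stopped'));
```

A pgEnum in the Drizzle schema would achieve the same end with a generated migration.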


✅ What's well done

  • Atomic billing transaction — the rebill guard + balance floor in SQL prevents double-billing and race conditions correctly.
  • Timing-safe cron auth — HMAC wrapping avoids leaking secret length via timingSafeEqual.
  • Fail-closed credit gate — checkMiladyCreditGate returns allowed: false on any DB error, correct behavior.
  • Migration hygiene — IF NOT EXISTS throughout, single-purpose file, under 100 lines. ✓
  • pLimit implementation — correct for single-threaded JS async; the shared-closure increment is safe.
  • Optimistic table updates — the mergeApiData / setLocalSandboxes pattern is clean and the re-sync on prop change handles page navigation correctly.
  • Test coverage — classifyContainerHealth unit tests cover all five meaningful paths.
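The timing-safe cron auth bullet refers to a standard trick: timingSafeEqual requires equal-length buffers (and naive length checks can leak secret length), so both values are first HMAC'd with an ephemeral key down to a fixed 32 bytes. A minimal sketch, with safeCompare as an illustrative name rather than the PR's actual code:

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Both inputs are hashed to a fixed 32-byte digest with a throwaway key,
// so timingSafeEqual never throws on length mismatch and comparison time
// does not depend on how much of the secret matched.
function safeCompare(provided: string, expected: string): boolean {
  const key = randomBytes(32);
  const a = createHmac("sha256", key).update(provided).digest();
  const b = createHmac("sha256", key).update(expected).digest();
  return timingSafeEqual(a, b);
}
```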


claude bot commented Mar 16, 2026

PR #393 Review — Dashboard Improvements: Admin Infra, Table Reactivity, Page Separation

Overall this is a solid, well-structured PR. The billing infrastructure is particularly well thought-out with proper transaction isolation and a rebill guard. A few issues worth addressing before merge:


🐛 Potential Bugs

1. last_backup_at column missing from migration

app/api/cron/milady-billing/route.ts queries isNotNull(miladySandboxes.last_backup_at) for idle-billing stopped sandboxes, but last_backup_at is not added in 0052_add_milady_billing_columns.sql and doesn't appear in the schema diff. If this column doesn't already exist in production, the billing cron will throw a runtime error for every stopped agent.

Please verify this column exists, and if not, add it to the migration.

2. Duplicate formatRelativeShort function

formatRelativeShort is defined twice — once as a local private function in app/dashboard/containers/agents/[id]/page.tsx and again as the exported version in packages/lib/constants/sandbox-status.ts. The page file should import from the shared constants file instead.

3. jose downgrade from v6 → v4

package.json pins jose to ^4.15.0 and the lockfile moves to jose@4.15.9. The session-service fixes (setExpirationTime(Math.floor(expiresAt.getTime() / 1000)) and removing the generic type from jwtVerify) look like they were needed to work around v4 API differences. This is fine as a targeted fix, but jose v4 is two major versions behind v6. Consider leaving a comment explaining the constraint so no one "upgrades" it back and reintroduces the bug.


⚠️ Design/Quality Issues

4. In-memory snapshot cache won't work on Vercel

packages/lib/services/admin-infrastructure.ts uses a module-level snapshotCache variable with a 30-second TTL. On Vercel Serverless, each invocation may start cold, so the cache will usually be empty. The comment acknowledges this ("Serverless cold starts will begin with an empty cache") but the actual benefit is near-zero in production. Either:

  • Accept the no-caching behaviour and remove the cache code to reduce complexity, or
  • Move to a Redis/KV-backed cache if the SSH inspection latency is a real concern.
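The pattern under discussion, and why cold starts defeat it, in a minimal sketch (getSnapshot and TTL_MS are illustrative names, not the PR's code):

```typescript
// Module-scope TTL cache sketch. The cache lives only as long as this
// module instance: a serverless cold start re-evaluates the module, so
// `cached` starts out null and the first request always misses.
const TTL_MS = 30_000;

let cached: { value: unknown; fetchedAt: number } | null = null;

async function getSnapshot(fetchFn: () => Promise<unknown>): Promise<unknown> {
  const now = Date.now();
  if (cached && now - cached.fetchedAt < TTL_MS) {
    return cached.value; // warm hit: same instance, within the TTL
  }
  const value = await fetchFn();
  cached = { value, fetchedAt: now };
  return value;
}
```

A Redis/KV-backed variant would replace the module-scope variable with a keyed get/set carrying the same TTL.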

5. Cron response exposes PII in the success body

handleMiladyBilling returns up to 100 billing result entries in the response JSON, including sandboxId, organizationId, and agentName. While protected by CRON_SECRET, this adds unnecessary surface area. Vercel cron logs are retained and accessible to team members. Consider dropping results from the response body and relying on server logs for debugging.

6. In-memory org balance staleness can cause skipped warnings within a single cron run

In handleMiladyBilling, after a successful billing:

org.credit_balance = String(result.newBalance);

But if a billing fails (e.g. InsufficientCreditsDuringBillingError), the org object in orgMap still holds the pre-transaction balance. When a second sandbox for the same org is processed, it may incorrectly pass the currentBalance >= hourlyCost check inside queueShutdownWarning() because getOrgBalance() is called separately there. This is unlikely to cause real financial harm but could result in a warning being skipped for an org that should receive one.
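One way to avoid trusting a stale in-memory balance, sketched with entirely hypothetical names (the PR's orgMap logic may differ): update the cached value on successful charges, but mark it stale on any failure so later checks refetch instead of reusing a pre-transaction number.

```typescript
// Hypothetical per-run balance cache: onChargeFailure invalidates the
// cached value, so the next getBalance refetches from the source of truth
// rather than passing a warning check on stale data.
class OrgBalanceCache {
  private balances = new Map<string, number>();
  private stale = new Set<string>();

  constructor(private refetch: (orgId: string) => number) {}

  seed(orgId: string, balance: number): void {
    this.balances.set(orgId, balance);
  }

  onChargeSuccess(orgId: string, newBalance: number): void {
    this.balances.set(orgId, newBalance);
    this.stale.delete(orgId);
  }

  onChargeFailure(orgId: string): void {
    this.stale.add(orgId); // don't trust the pre-transaction value
  }

  getBalance(orgId: string): number {
    if (this.stale.has(orgId) || !this.balances.has(orgId)) {
      const fresh = this.refetch(orgId);
      this.balances.set(orgId, fresh);
      this.stale.delete(orgId);
      return fresh;
    }
    return this.balances.get(orgId)!;
  }
}
```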


✅ Things Done Well

  • Transaction isolation in billing: The UPDATE … WHERE last_billed_at < rebillCutoff pattern that claims the row before deducting credits is correct and prevents double-billing from concurrent cron runs.
  • Fail-closed credit gate: checkMiladyCreditGate returns allowed: false on DB errors — the right default for a billing gate.
  • Migration quality: Small, targeted migration following the CLAUDE.md rules. Uses IF NOT EXISTS and ADD COLUMN IF NOT EXISTS. ✓
  • Table reactivity rewrite: The mergeApiData / useSandboxListPoll approach is clean — preserves server-side-only fields while live-updating status from the client.
  • Headscale API version fallback: The /api/v1/node → /api/v1/machine fallback with 404 detection is a pragmatic compatibility fix.
  • void → .catch() in agent-budgets: Replacing void this.triggerAutoRefill() with .catch(err => logger.error()) is the correct pattern for fire-and-forget with error visibility.
  • AuthenticationError/ForbiddenError handling added to admin routes: Consistent with the rest of the API surface.

Minor

  • packages/lib/constants/sandbox-status.ts defines both formatRelative and formatRelativeShort which are nearly identical except for the "—" vs "Never" empty-date default and the last line. Consider consolidating with an options parameter.
  • The custom pLimit implementation in admin-infrastructure.ts is correct, but the existing p-limit package (already in use elsewhere) would reduce maintenance surface.
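For reference, a limiter of the kind described is small; this sketch (not the PR's exact implementation) shows why the shared-closure counter is safe, since JavaScript's event loop is single-threaded and the counter only mutates synchronously between awaits:

```typescript
// Minimal concurrency limiter: `active` and `queue` live in the shared
// closure; waiters park on a promise and are released one at a time as
// running tasks settle.
function pLimit(concurrency: number) {
  let active = 0;
  const queue: Array<() => void> = [];

  const release = () => {
    active--;
    queue.shift()?.(); // wake the oldest waiter, if any
  };

  return async <T>(task: () => Promise<T>): Promise<T> => {
    if (active >= concurrency) {
      await new Promise<void>((resolve) => queue.push(resolve));
    }
    active++;
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

The p-limit package handles additional edge cases (sync re-entry between release and resume), which is part of the argument for reusing it.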


@lalalune lalalune left a comment

Reviewed the follow-up fixes and verified the latest CI run is green.

@lalalune lalalune merged commit c39d439 into dev Mar 16, 2026
18 checks passed